DeepKey: Towards End-to-End Physical Key Replication From a Single Photograph
This paper describes DeepKey, an end-to-end deep neural architecture capable
of taking a digital RGB image of an 'everyday' scene containing a pin tumbler
key (e.g. lying on a table or carpet) and fully automatically inferring a
printable 3D key model. We report on the key detection performance and describe
how candidates can be transformed into physical prints. We show an example
opening a real-world lock. Our system is described in detail, providing a
breakdown of all components including key detection, pose normalisation,
bitting segmentation and 3D model inference. We provide an in-depth evaluation
and conclude by reflecting on limitations, applications, potential security
risks and societal impact. We contribute the DeepKey Datasets of 5,300+ images
covering a few test keys with bounding boxes, pose and unaligned mask data.
Comment: 14 pages, 12 figures
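A minimal sketch of the four-stage pipeline the abstract names (key detection, pose normalisation, bitting segmentation, 3D model inference), with hypothetical function names and placeholder logic rather than the authors' actual components:

```python
# Hypothetical pipeline sketch: detection -> pose normalisation ->
# bitting segmentation -> bitting-code inference. All names, shapes and
# logic here are assumptions, not DeepKey's API.
import numpy as np

def detect_key(image: np.ndarray) -> np.ndarray:
    """Return a crop around the detected key (stand-in for the detector)."""
    return image  # placeholder: a real detector returns a bounding-box crop

def normalise_pose(crop: np.ndarray) -> np.ndarray:
    """Warp the crop to a canonical, fronto-parallel key pose."""
    return crop  # placeholder: e.g. a learned affine/homography warp

def segment_bitting(aligned: np.ndarray) -> np.ndarray:
    """Binary mask of the bitting (tooth-cut) region along the blade."""
    return (aligned.mean(axis=-1) > 128).astype(np.uint8)

def infer_bitting_code(mask: np.ndarray, n_pins: int = 5) -> list[int]:
    """Quantise cut depths at each pin position into a printable code."""
    columns = np.array_split(mask.sum(axis=0), n_pins)
    return [int(c.mean()) for c in columns]

photo = np.zeros((128, 256, 3), dtype=np.uint8)  # dummy input image
code = infer_bitting_code(segment_bitting(normalise_pose(detect_key(photo))))
```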
Depth Estimation Through a Generative Model of Light Field Synthesis
Light field photography captures rich structural information that may
facilitate a number of traditional image processing and computer vision tasks.
A crucial ingredient in such endeavors is accurate depth recovery. We present a
novel framework that allows the recovery of a high quality continuous depth map
from light field data. To this end we propose a generative model of a light
field that is fully parametrized by its corresponding depth map. The model
allows for the integration of powerful regularization techniques such as a
non-local means prior, facilitating accurate depth map estimation.
Comment: German Conference on Pattern Recognition (GCPR) 2016
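A rough sketch of the generative idea, under stated assumptions: each light field view is synthesised by warping the central view according to the disparity implied by the depth map, and depth is recovered by minimising the reconstruction error plus a regulariser (a simple gradient penalty stands in here for the paper's non-local means prior):

```python
# Sketch only: depth-parametrised light field synthesis plus an energy to
# minimise. Function names and the smoothness term are assumptions.
import numpy as np
from scipy.ndimage import map_coordinates

def synthesise_view(center, disparity, du, dv):
    """Warp the central view to angular offset (du, dv) via per-pixel disparity."""
    h, w = center.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    return map_coordinates(center, [y + dv * disparity, x + du * disparity],
                           order=1, mode='nearest')

def energy(disparity, center, views, offsets, lam=0.1):
    """Photometric data term over all views plus a simple smoothness prior."""
    data = sum(np.sum((synthesise_view(center, disparity, du, dv) - v) ** 2)
               for (du, dv), v in zip(offsets, views))
    smooth = np.sum(np.diff(disparity, axis=0) ** 2) + \
             np.sum(np.diff(disparity, axis=1) ** 2)
    return data + lam * smooth
```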
Room reflections and constancy in speech-like sounds: within-band effects
The experiment asks whether constancy in hearing precedes or follows grouping. Listeners heard speech-like
sounds comprising 8 auditory-filter-shaped noise bands whose temporal envelopes corresponded to those
arising in these filters when a speech message is played. The 'context' words in the message were "next you'll
get _ to click on", into which a "sir" or "stir" test word was inserted. These test words were drawn from an
11-step continuum that was formed by amplitude modulation. Listeners identified the test words appropriately
and quite consistently, even though they had the 'robotic' quality typical of this type of 8-band speech. The
speech-like effects of these sounds appear to be a consequence of auditory grouping. Constancy was assessed by
comparing the influence of room reflections on the test word across conditions where the context had either the
same level of reflections or a much lower level. Constancy effects were obtained with these 8-band sounds,
but only in 'matched' conditions, where the room reflections were in the same bands in both the context and the
test word. In the comparison 'mismatched' condition, no constancy effects were found. It would appear that this
type of constancy in hearing precedes the across-channel grouping whose effects are so apparent in these sounds.
This result is discussed in terms of the ubiquity of grouping across different levels of representation.
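As a rough illustration of how an amplitude-modulation continuum of this kind can be built (the details below are assumptions, not the study's stimulus-generation procedure), each step interpolates between the two words' temporal envelopes and imposes the result on a noise carrier:

```python
# Hedged sketch of an 11-step "sir"-"stir" continuum via amplitude
# modulation. Assumes the two recordings are time-aligned and equal length.
import numpy as np

def envelope(x, fs, cutoff=50.0):
    """Crude temporal envelope: rectify, then moving-average smooth."""
    win = max(1, int(fs / cutoff))
    return np.convolve(np.abs(x), np.ones(win) / win, mode='same')

def continuum(sir, stir, fs, steps=11):
    """Return `steps` stimuli morphing the envelope from 'sir' to 'stir'."""
    rng = np.random.default_rng(0)
    carrier = rng.standard_normal(len(sir))      # noise-band carrier
    e_sir, e_stir = envelope(sir, fs), envelope(stir, fs)
    return [((1 - a) * e_sir + a * e_stir) * carrier
            for a in np.linspace(0.0, 1.0, steps)]
```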
Baseline and triangulation geometry in a standard plenoptic camera
In this paper, we demonstrate light field triangulation to determine depth distances and baselines in a plenoptic camera. Advances in micro lenses and image sensors have enabled plenoptic cameras to capture a scene from different viewpoints with sufficient spatial resolution. While object distances can be inferred from disparities in a stereo viewpoint pair using triangulation, this concept remains ambiguous when applied to plenoptic cameras. We present a geometrical light field model that allows triangulation to be applied to a plenoptic camera in order to predict object distances or to specify baselines as desired. It is shown that distance estimates from our novel method match those of real objects placed in front of the camera. Additional benchmark tests with an optical design software further validate the model's accuracy, with deviations of less than 0.33 % for several main lens types and focus settings. A variety of applications in the automotive and robotics fields can benefit from this estimation model.
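The triangulation relation at the heart of this approach is the standard one: an object at distance Z produces disparity d between two viewpoints separated by baseline B behind a lens of focal length f, so Z = fB/d. A minimal sketch (the numbers below are illustrative assumptions, not calibrated values):

```python
# Standard stereo/plenoptic triangulation: Z = f * B / d.
def depth_from_disparity(f_mm: float, baseline_mm: float, disparity_mm: float) -> float:
    """Object distance (mm) from focal length, baseline and disparity."""
    if disparity_mm <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f_mm * baseline_mm / disparity_mm

# e.g. a 50 mm main lens, a 2 mm virtual baseline between sub-aperture
# views, and 0.02 mm sensor-plane disparity give a 5 m object distance.
print(depth_from_disparity(50.0, 2.0, 0.02))  # 5000.0 mm
```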
Cortical Maps
In this article, we review functional organization in sensory cortical regions: how the cortex represents the world. We consider four interrelated aspects of cortical organization: (1) the set of receptive fields of individual cortical sensory neurons, (2) how lateral interaction between cortical neurons reflects the similarity of their receptive fields, (3) the spatial distribution of receptive-field properties across the horizontal extent of the cortical tissue, and (4) how the spatial distributions of different receptive-field properties interact with one another. We show how these data are generally well explained by the theory of input-driven self-organization, with a family of computational models of cortical maps offering a parsimonious account for a wide range of map-related phenomena. We then discuss important challenges to this explanation, with respect to the maps present at birth, maps present under activity blockade, the limits of adult plasticity, and the lack of some maps in rodents. Because there is not at present another credible general theory for cortical map development, we conclude by proposing key experiments to help uncover other mechanisms that might also be operating during map development.
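Input-driven self-organization can be illustrated in miniature with a generic one-dimensional Kohonen-style map (a deliberately simplified sketch, not the LISSOM-family models the article reviews): units develop topographically ordered stimulus preferences from input statistics alone.

```python
# Generic 1-D self-organising map: a neighbourhood-weighted update pulls
# the winning unit and its neighbours toward each stimulus, so ordered
# "maps" of stimulus preference emerge from the input distribution.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_steps = 20, 5000
w = rng.random(n_units)                   # each unit's preferred stimulus

for t in range(n_steps):
    x = rng.random()                      # stimulus from the input ensemble
    winner = np.argmin(np.abs(w - x))     # best-matching unit
    dist = np.abs(np.arange(n_units) - winner)
    h = np.exp(-dist**2 / (2 * 2.0**2))   # Gaussian neighbourhood function
    w += 0.1 * h * (x - w)

# Typically True: preferences become topographically ordered along the map.
print(np.all(np.diff(w) > 0) or np.all(np.diff(w) < 0))
```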
Learning to Extract Motion from Videos in Convolutional Neural Networks
This paper shows how to extract dense optical flow from videos with a
convolutional neural network (CNN). The proposed model constitutes a potential
building block for deeper architectures to allow using motion without resorting
to an external algorithm, e.g. for recognition in videos. We derive our network
architecture from signal processing principles to provide desired invariances
to image contrast, phase and texture. We constrain weights within the network
to enforce strict rotation invariance and substantially reduce the number of
parameters to learn. We demonstrate end-to-end training on only 8 sequences of
the Middlebury dataset, orders of magnitude less than competing CNN-based
motion estimation methods, and obtain comparable performance to classical
methods on the Middlebury benchmark. Importantly, our method outputs a
distributed representation of motion that allows representing multiple,
transparent motions, and dynamic textures. Our contributions on network design
and rotation invariance offer insights that are not specific to motion estimation.
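One way to picture the weight-tying idea (an illustration of the general technique, not the authors' exact architecture): learn a single base filter and derive the rest of the bank as rotated copies, so the learned parameter count shrinks by the number of orientations.

```python
# Rotation-tied filter bank: only `base` carries learnable parameters; the
# other orientations are derived by rotation, so rotating the input
# (approximately) permutes the filter responses.
import numpy as np
from scipy.ndimage import rotate

def rotated_bank(base: np.ndarray, n_orientations: int = 8) -> np.ndarray:
    """Stack of `n_orientations` rotated copies of one learned filter."""
    angles = np.linspace(0, 360, n_orientations, endpoint=False)
    return np.stack([rotate(base, a, reshape=False, order=1) for a in angles])

base = np.random.default_rng(0).standard_normal((7, 7))  # one learned filter
bank = rotated_bank(base)                                 # 8 tied filters
print(bank.shape)                                         # (8, 7, 7)
```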
Feature Pyramid Transformer
Feature interactions across space and scales underpin modern visual
recognition systems because they introduce beneficial visual contexts.
Conventionally, spatial contexts are passively hidden in the CNN's increasing
receptive fields or actively encoded by non-local convolution. Yet, the
non-local spatial interactions are not across scales, and thus they fail to
capture the non-local contexts of objects (or parts) residing in different
scales. To this end, we propose a fully active feature interaction across both
space and scales, called Feature Pyramid Transformer (FPT). It transforms any
feature pyramid into another feature pyramid of the same size but with richer
contexts, by using three specially designed transformers in self-level,
top-down, and bottom-up interaction fashion. FPT serves as a generic visual
backbone with fair computational overhead. We conduct extensive experiments in
both instance-level (i.e., object detection and instance segmentation) and
pixel-level segmentation tasks, using various backbones and head networks, and
observe consistent improvement over all the baselines and the state-of-the-art
methods.
Comment: Published at the European Conference on Computer Vision, 2020
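A simplified sketch of cross-scale interaction in the spirit of FPT (plain dot-product attention stands in for the three specially designed transformers): each flattened pyramid level is enriched by attending to itself, to a coarser level, or to a finer one.

```python
# Cross-scale attention sketch: queries from one pyramid level gather
# context from another level; output has the query level's size.
import numpy as np

def attend(query_feats, key_feats):
    """query_feats: (Nq, C); key_feats: (Nk, C). Returns (Nq, C)."""
    scores = query_feats @ key_feats.T / np.sqrt(query_feats.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ key_feats

rng = np.random.default_rng(0)
fine = rng.standard_normal((32 * 32, 256))    # fine pyramid level, flattened
coarse = rng.standard_normal((8 * 8, 256))    # coarse pyramid level
self_level = attend(fine, fine)               # self-level interaction
top_down = attend(fine, coarse)               # fine queries coarse context
bottom_up = attend(coarse, fine)              # coarse queries fine detail
print(top_down.shape)                         # (1024, 256): same size, richer context
```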
Accidental Pinhole and Pinspeck Cameras
We identify and study two types of "accidental" images that can form in scenes. The first is the accidental pinhole camera image. The second is the "inverse" pinhole camera image, formed by subtracting an image with a small occluder present from a reference image without the occluder. Both types of accidental camera arise in a variety of situations: for example, an indoor scene illuminated by natural light, or a street with a person walking under the shadow of a building. The images produced by accidental cameras are often mistaken for shadows or interreflections, yet they can reveal information about the scene outside the image, the lighting conditions, or the aperture by which light enters the scene.
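The "inverse" pinhole computation the abstract describes reduces to a frame difference; a minimal sketch:

```python
# Inverse ("pinspeck") camera: the difference between a reference frame and
# a frame with a small occluder is (up to noise) the light the occluder
# blocked, i.e. an inverted pinhole-like image of the outside scene.
import numpy as np

def inverse_pinhole(reference: np.ndarray, with_occluder: np.ndarray) -> np.ndarray:
    """Difference image, flipped to undo the pinhole inversion."""
    diff = reference.astype(float) - with_occluder.astype(float)
    return diff[::-1, ::-1]  # pinhole images form inverted; flip upright
```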
Vehicle Re-identification in Context
Existing vehicle re-identification (re-id) evaluation benchmarks consider strongly artificial test scenarios by assuming the availability of high-quality images and fine-grained appearance at an almost constant image scale, reminiscent of images required for Automatic Number Plate Recognition, e.g. VeRi-776. Such assumptions are often invalid in realistic vehicle re-id scenarios, where arbitrarily changing image resolutions (scales) are the norm. This makes the existing vehicle re-id benchmarks limited for testing the true performance of a re-id method. In this work, we introduce a more realistic and challenging vehicle re-id benchmark, called Vehicle Re-Identification in Context (VRIC). In contrast to existing vehicle re-id datasets, VRIC is uniquely characterised by vehicle images subject to more realistic and unconstrained variations in resolution (scale), motion blur, illumination, occlusion, and viewpoint. It contains 60,430 images of 5,622 vehicle identities captured by 60 different cameras in heterogeneous road traffic scenes in both day-time and night-time. Given the nature of this new benchmark, we further investigate a multi-scale matching approach to vehicle re-id by learning more discriminative feature representations from multi-resolution images. Extensive evaluations show that the proposed multi-scale method outperforms the state-of-the-art vehicle re-id methods on three benchmark datasets: VehicleID, VeRi-776, and VRIC (available at http://qmul-vric.github.io).
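A hedged sketch of the multi-scale matching idea (the general recipe, not the paper's network): embed each image at several resolutions, fuse the per-scale embeddings into one descriptor, and rank the gallery by cosine similarity.

```python
# Multi-scale descriptor sketch. `embed` is a trivial stand-in for a CNN
# embedding; any per-scale feature extractor could be substituted.
import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    """Placeholder embedding: L2-normalised global colour descriptor."""
    v = image.mean(axis=(0, 1))
    return v / (np.linalg.norm(v) + 1e-8)

def multiscale_descriptor(image: np.ndarray, strides=(1, 2, 4)) -> np.ndarray:
    """Concatenate embeddings of the image subsampled at several strides."""
    d = np.concatenate([embed(image[::s, ::s]) for s in strides])
    return d / (np.linalg.norm(d) + 1e-8)

def rank_gallery(query_desc: np.ndarray, gallery_descs: np.ndarray) -> np.ndarray:
    """Gallery indices sorted best-first by cosine similarity (unit norms)."""
    return np.argsort(-(gallery_descs @ query_desc))
```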